41 research outputs found

    Artificial Neural Network Pruning to Extract Knowledge

    Artificial Neural Networks (NN) are widely used for solving complex problems, from medical diagnostics to face recognition. Despite notable successes, the main disadvantages of NN are also well known: the risk of overfitting, lack of explainability (inability to extract algorithms from trained NN), and high consumption of computing resources. Determining the appropriate specific NN structure for each problem can help overcome these difficulties: an NN that is too poor cannot be trained successfully, while an NN that is too rich gives unexplainable results and may have a high chance of overfitting. Reducing the precision of NN parameters simplifies the implementation of these NN, saves computing resources, and makes the NN skills more transparent. This paper lists the basic NN simplification problems and controlled pruning procedures to solve these problems. All the described pruning procedures can be implemented in one framework. The developed procedures, in particular, find the optimal structure of NN for each task, measure the influence of each input signal and NN parameter, and provide a detailed verbal description of the algorithms and skills of NN. The described methods are illustrated by a simple example: the generation of explicit algorithms for predicting the results of the US presidential election. Comment: IJCNN 202

    Fractional norms and quasinorms do not help to overcome the curse of dimensionality

    The curse of dimensionality causes well-known and widely discussed problems for machine learning methods. There is a hypothesis that using the Manhattan distance and even fractional quasinorms lp (for p less than 1) can help to overcome the curse of dimensionality in classification problems. In this study, we systematically test this hypothesis. We confirm that fractional quasinorms have a greater relative contrast or coefficient of variation than the Euclidean norm l2, but we also demonstrate that the distance concentration shows qualitatively the same behaviour for all tested norms and quasinorms and that the difference between them decays as the dimension tends to infinity. Estimation of classification quality for kNN based on different norms and quasinorms shows that a greater relative contrast does not mean better classifier performance, and the worst performance for different databases was shown by different norms (quasinorms). A systematic comparison shows that the difference in the performance of kNN based on lp for p=2, 1, and 0.5 is statistically insignificant.
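
The relative contrast mentioned in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' experimental code: it assumes points drawn uniformly from the unit cube, defines relative contrast as (D_max - D_min) / D_min for distances from a random query point, and uses hypothetical function names chosen for illustration.

```python
import random

def lp_distance(x, y, p):
    """l_p distance between x and y; only a quasinorm-based distance when 0 < p < 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def relative_contrast(points, query, p):
    """(D_max - D_min) / D_min over the distances from query to all points."""
    dists = [lp_distance(query, pt, p) for pt in points]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min

def mean_relative_contrast(dim, p, n_points=100, n_trials=20, seed=0):
    """Average relative contrast over several random samples (assumed: uniform data)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
        query = [rng.random() for _ in range(dim)]
        total += relative_contrast(points, query, p)
    return total / n_trials

if __name__ == "__main__":
    # Print the averaged contrast for each dimension and each value of p.
    for dim in (2, 10, 100):
        row = ", ".join(
            f"p={p}: {mean_relative_contrast(dim, p):.3f}" for p in (0.5, 1, 2)
        )
        print(f"dim={dim}: {row}")
```

Running the script prints one row per dimension, so the shrinking contrast with growing dimension can be inspected for each p side by side.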

    Long and short range multi-locus QTL interactions in a complex trait of yeast

    We analyse interactions of Quantitative Trait Loci (QTL) in heat-selected yeast by comparing them to an unselected pool of random individuals. Here we re-examine data on individual F12 progeny selected for heat tolerance, which have been genotyped at 25 locations identified by sequencing a selected pool [Parts, L., Cubillos, F. A., Warringer, J., Jain, K., Salinas, F., Bumpstead, S. J., Molin, M., Zia, A., Simpson, J. T., Quail, M. A., Moses, A., Louis, E. J., Durbin, R., and Liti, G. (2011). Genome research, 21(7), 1131-1138]. 960 individuals were genotyped at these locations and multi-locus genotype frequencies were compared to 172 sequenced individuals from the original unselected pool (a control group). Various non-random associations were found across the genome, both within chromosomes and between chromosomes. Some of the non-random associations are likely due to retention of linkage disequilibrium in the F12 population; however, many, including the inter-chromosomal interactions, must be due to genetic interactions in heat tolerance. One region of particular interest involves 3 linked loci on chromosome IV where the central variant responsible for heat tolerance is antagonistic, coming from the heat-sensitive parent, and the flanking ones are from the more heat-tolerant parent. The 3-locus haplotypes in the selected individuals represent a highly biased sample of the population haplotypes with rare double recombinants in high frequency. These were missed in the original analysis and would never be seen without the multigenerational approach. We show that a statistical analysis of entropy and information gain in genotypes of a selected population can reveal more interactions than previously seen. Importantly, this must be done in comparison to the unselected population's genotypes to account for inherent biases in the original population.
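
The idea of comparing an entropy-based association measure between a selected and an unselected pool can be sketched as follows. This is an illustrative assumption, not the statistic used in the paper: it computes the mutual information between two loci from haploid genotype lists, and the toy genotypes (alleles labelled "S" and "T" for the two parents) are hypothetical data made up for the example.

```python
from collections import Counter
from math import log2

def entropy(symbols):
    """Shannon entropy (bits) of the empirical distribution of symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())

def mutual_information(locus_a, locus_b):
    """I(A;B) = H(A) + H(B) - H(A,B), in bits, from paired allele lists."""
    joint = list(zip(locus_a, locus_b))
    return entropy(locus_a) + entropy(locus_b) - entropy(joint)

# Hypothetical toy data: one allele per individual at each of two loci.
# Control pool: all four allele combinations equally frequent (no association).
control_a = ["S", "S", "T", "T"] * 25
control_b = ["S", "T", "S", "T"] * 25
# Selected pool: the two loci always carry alleles from the same parent.
selected_a = ["S", "T"] * 50
selected_b = ["S", "T"] * 50

mi_control = mutual_information(control_a, control_b)
mi_selected = mutual_information(selected_a, selected_b)

if __name__ == "__main__":
    print(f"control pool:  {mi_control:.3f} bits")
    print(f"selected pool: {mi_selected:.3f} bits")
```

Reporting the selected-pool value next to the control-pool value mirrors the abstract's point that associations already present in the unselected population must be accounted for before attributing them to selection.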

    Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation

    Finding the best architectures of learning machines, such as deep neural networks, is a well-known technical and theoretical challenge. Recent work by Mellor et al. (2021) showed that there may exist correlations between the accuracies of trained networks and the values of some easily computable measures defined on randomly initialised networks, which may make it possible to search tens of thousands of neural architectures without training. Mellor et al. used the Hamming distance evaluated over all ReLU neurons as such a measure. Motivated by these findings, in our work we ask whether there exist other, and perhaps more principled, measures which could be used as determinants of success of a given neural architecture. In particular, we examine whether the dimensionality and quasi-orthogonality of neural networks' feature space could be correlated with the network's performance after training. We showed, using the same setup as Mellor et al., that dimensionality and quasi-orthogonality may jointly serve as discriminants of a network's performance. In addition to offering new opportunities to accelerate neural architecture search, our findings suggest important relationships between the networks' final performance and properties of their randomly initialised feature spaces: data dimension and quasi-orthogonality.
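
Quasi-orthogonality can be illustrated with random vectors: in high dimension, independently drawn vectors tend to be nearly orthogonal. The sketch below is only an illustration under stated assumptions (Gaussian random vectors standing in for feature vectors, mean absolute cosine as the summary statistic, hypothetical function names); it is not the measure defined in the paper.

```python
import math
import random

def cosine(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_abs_cosine(dim, n_vectors=30, seed=0):
    """Mean |cosine| over all pairs of random Gaussian vectors in R^dim.

    Values close to 0 indicate that the vectors are almost orthogonal.
    """
    rng = random.Random(seed)
    vecs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_vectors)]
    pairs = [(i, j) for i in range(n_vectors) for j in range(i + 1, n_vectors)]
    return sum(abs(cosine(vecs[i], vecs[j])) for i, j in pairs) / len(pairs)

if __name__ == "__main__":
    for dim in (3, 30, 300, 1000):
        print(f"dim={dim}: mean |cos| = {mean_abs_cosine(dim):.3f}")
```

The same statistic could in principle be applied to feature vectors extracted from a randomly initialised network, but how those vectors are obtained is outside what the abstract specifies.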

    Robust and scalable learning of complex dataset topologies via ElPiGraph

    Large datasets represented by multidimensional data point clouds often possess non-trivial distributions with branching trajectories and excluded regions, with the recent single-cell transcriptomic studies of developing embryos being notable examples. Reducing the complexity and producing compact and interpretable representations of such data remains a challenging task. Most of the existing computational methods are based on exploring the local data point neighbourhood relations, a step that can perform poorly in the case of multidimensional and noisy data. Here we present ElPiGraph, a scalable and robust method for approximation of datasets with complex structures which does not require computing the complete data distance matrix or the data point neighbourhood graph. This method is able to withstand high levels of noise and is capable of approximating complex topologies via principal graph ensembles that can be combined into a consensus principal graph. ElPiGraph deals efficiently with large and complex datasets in various fields from biology, where it can be used to infer gene dynamics from single-cell RNA-Seq, to astronomy, where it can be used to explore complex structures in the distribution of galaxies. Comment: 32 pages, 14 figures

    Personality Traits and Drug Consumption. A Story Told by Data

    This is a preprint version of the first book from the series "Stories told by data". In this book a story is told about the psychological traits associated with drug consumption. The book includes:
    - A review of published works on the psychological profiles of drug users.
    - Analysis of a new original database with information on 1885 respondents and usage of 18 drugs. (Database is available online.)
    - An introductory description of the data mining and machine learning methods used for the analysis of this dataset.
    - The demonstration that the personality traits (five factor model, impulsivity, and sensation seeking), together with simple demographic data, make it possible to predict the risk of consumption of individual drugs with sensitivity and specificity above 70% for most drugs.
    - The analysis of correlations of use of different substances and the description of the groups of drugs with correlated use (correlation pleiades).
    - Proof of significant differences of personality profiles for users of different drugs. This is explicitly proved for benzodiazepines, ecstasy, and heroin.
    - Tables of personality profiles for users and non-users of 18 substances.
    The book is aimed at advanced undergraduates or first-year PhD students, as well as researchers and practitioners. No previous knowledge of machine learning, advanced data mining concepts or modern psychology of personality is assumed. For a more detailed introduction to statistical methods we recommend several undergraduate textbooks. Familiarity with basic statistics and some experience in the use of probabilities would be helpful, as well as some basic technical understanding of psychology.
    Comment: A preprint version prepared by the authors before the Springer editorial work. 124 pages, 27 figures, 63 tables, bibl. 24

    Metallome of cerebrovascular endothelial cells infected with Toxoplasma gondii using μ-XRF imaging and inductively coupled plasma mass spectrometry

    In this study, we measured the levels of elements in human brain microvascular endothelial cells (ECs) infected with T. gondii. ECs were infected with tachyzoites of the RH strain, and at 6, 24, and 48 hours post infection (hpi), the intracellular concentrations of elements were determined using a synchrotron–microfocus X-ray fluorescence microscopy (μ-XRF) system. This method enabled the quantification of the concentrations of Zn and Ca in infected and uninfected (control) ECs at sub-micron spatial resolution. T. gondii-hosting ECs contained less Zn than uninfected cells only at 48 hpi (p 0.05). Inductively Coupled Plasma Mass Spectrometry (ICP-MS) analysis revealed infection-specific metallome profiles characterized by significant increases in the intracellular levels of Zn, Fe, Mn and Cu at 48 hpi (p < 0.01), and significant reductions in the extracellular concentrations of Co, Cu, Mo, V, and Ag at 24 hpi (p < 0.05) compared with control cells. Zn constituted the largest part (74%) of the total metal composition (metallome) of the parasite. Gene expression analysis showed infection-specific upregulation in the expression of five genes, MT1JP, MT1M, MT1E, MT1F, and MT1X, belonging to the metallothionein gene family. These results point to a possible correlation between T. gondii infection and increased expression of MT1 isoforms and altered intracellular levels of elements, especially Zn and Fe. Taken together, a combined μ-XRF and ICP-MS approach is promising for studies of the role of elements in mediating host–parasite interaction

    Fibro-inflammatory recovery and type 2 diabetes remission following a low calorie diet but not exercise training: A secondary analysis of the DIASTOLIC randomised controlled trial

    Aims: To investigate the relationship between fibro-inflammatory biomarkers and cardiovascular structure/function in people with Type 2 Diabetes (T2D) compared to healthy controls and the effect of two lifestyle interventions in T2D.
    Methods: Data were derived from the DIASTOLIC randomised controlled trial (RCT) and include a comparison between those with T2D and the matched healthy volunteers recruited at baseline. Adults with T2D without cardiovascular disease (CVD) were randomized to one of three 12-week interventions: (1) exercise training, (2) a low-energy (∼810 kcal/day) meal-replacement plan (MRP) or (3) standard care. Principal Component and Fisher's linear discriminant analysis were used to investigate the relationships between MRI acquired cardiovascular outcomes and fibro-inflammatory biomarkers in cases versus controls and pre- and post-intervention in T2D.
    Results: At baseline, 83 people with T2D (mean age 50.5 ± 6.4; 58% male) and 36 healthy controls (mean age 48.6 ± 6.2; 53% male) were compared, and 76 people with T2D completed the RCT for pre-post analysis. Compared to healthy controls, subjects with T2D had adverse cardiovascular remodelling and a fibro-inflammatory profile (20 differentially expressed biomarkers). The 3D data visualisations showed almost complete separation between healthy controls and those with T2D, and a marked shift towards healthy controls following the MRP (15 biomarkers significantly changed) but not exercise training.
    Conclusions: Fibro-inflammatory pathways and cardiovascular structure/function are adversely altered before the onset of symptomatic CVD in middle-aged adults with T2D. The MRP improved the fibro-inflammatory profile of people with T2D towards a healthier status. Long-term studies are required to assess whether these changes lead to continued reverse cardiac remodelling and prevent CVD.

    A systematic autopsy survey of human infant bridging veins

    In the first years of life, subdural haemorrhage (SDH) within the cranial cavity can occur through accidental and non-accidental mechanisms as well as from birth-related injury. This type of bleeding is the most common finding in victims of abusive head trauma (AHT). Historically, the most frequent cause of SDHs in infancy is suggested to be traumatic damage to bridging veins traversing from the brain to the dural membrane. However, several alternative hypotheses have been suggested for the cause and origin of subdural bleeding. It has also been suggested by some that bridging veins are too large to rupture through the forces associated with AHT. To date, there have been no systematic anatomical studies on infant bridging veins. During 43 neonatal, infant and young child post-mortem examinations, we have mapped the locations and numbers of bridging veins onto a 3D model of the surface of a representative infant brain. We have also recorded the in situ diameter of 79 bridging veins from two neonates, one infant and two young children at post-mortem examination. Large numbers of veins, both distant from and directly entering the dural venous sinuses, were discovered travelling between the brain and dural membrane, with the mean number of veins per brain being 54.1 and the largest number recorded as 94. The mean diameter of the bridging veins was 0.93 mm, with measurements ranging from 0.05 to 3.07 mm. These data demonstrate that some veins are extremely small and, subjectively, appear to be delicate. Characterisation of infant bridging veins will contribute to the current understanding of potential vascular sources of subdural bleeding and could also be used to further develop computational models of infant head injury.